Abstract: In this paper, we investigate the optimal precoding 
scheme for relay networks with finite-alphabet constraints. We 
show that the previous work utilizing various design criteria 
to maximize either the diversity order or the transmission rate 
with the Gaussian-input assumption may lead to significant loss 
for a practical system with finite constellation set constraint. A 
linear precoding scheme is proposed to maximize the mutual 
information for relay networks. We exploit the structure of the 
optimal precoding matrix and develop a unified two-step iterative 
algorithm utilizing the theory of convex optimization and opti- 
mization on the complex Stiefel manifold. Numerical examples 
show that this novel iterative algorithm achieves significant gains 
compared to its conventional counterpart. 

I. Introduction 

Cooperative relaying is an emerging technology, which 
provides reliable high data rate transmission for wireless 
networks without the need of multiple antennas at each node. 
These benefits can be further exploited by utilizing judicious 
cooperative design (see [1]-[5] and the references therein). 

Most of the existing designs optimize the performance of
relay networks under Gaussian-input assumptions, for example,
by maximizing the output signal-to-noise ratio (SNR) [1], [2],
minimizing the mean square error (MSE) [1], [3], or maximizing
the channel capacity [3], [4], [5]. Despite their
information-theoretic optimality, Gaussian inputs can never be
realized in practice. Rather, in a practical communication
system the inputs must be drawn from a finite constellation
set, such as pulse amplitude modulation (PAM), phase shift
keying (PSK) modulation, or quadrature amplitude modulation
(QAM). These discrete constellations depart significantly from
the Gaussian idealization [6], [7]. Therefore, there is a large
performance gap between a scheme designed under the
Gaussian-input assumption and a scheme designed from the
standpoint of the finite-alphabet constraint [1].

In this paper, we consider two-hop relay networks with
finite-input constraints and use a linear precoder to improve
the maximum achievable transmission rate of the network. We
exploit the optimal structure of the precoding matrix under the
finite-alphabet constraint and develop a unified framework to
solve this nonconvex optimization problem. We prove that the
left singular matrix of the precoder coincides with the right 
singular matrix of the effective channel matrix; the mutual in- 
formation is a concave function of the power allocation vector 
of the precoder; the optimization of the right singular matrix 
with unitary constraint can be formulated as an unconstrained 
one on the complex Stiefel manifold. Once these important 
results are provided, the optimal precoder is solved with a 
complete two-step iterative algorithm utilizing the theory of 
convex optimization and optimization on the manifold. We 
show that this novel iterative algorithm achieves significant 
gains compared to its conventional counterpart. 

Notation: Boldface uppercase letters denote matrices, boldface
lowercase letters denote column vectors, and italics denote
scalars. The superscripts $(\cdot)^T$ and $(\cdot)^H$ stand for
transpose and Hermitian operations, respectively. The subscripts
$c_i$ and $[\mathbf{A}]_{ij}$ denote the $i$-th element of vector
$\mathbf{c}$ and the $(i,j)$-th element of matrix $\mathbf{A}$,
respectively. The operator $\mathrm{Diag}(\mathbf{a})$ represents
a diagonal matrix whose nonzero elements are given by the
elements of vector $\mathbf{a}$. Furthermore,
$\mathrm{vec}(\mathbf{A})$ represents the vector obtained by
stacking the columns of $\mathbf{A}$; $\mathbf{I}$ and
$\mathbf{0}$ represent an identity matrix and a zero matrix of
appropriate dimensions, respectively; $\mathrm{Tr}(\mathbf{A})$
denotes the trace operation. All logarithms are base 2.

II. System Model 

Consider a relay network with one transmit-and-receive pair,
where the source node attempts to communicate with the
destination node with the assistance of $k$ relays
$(r_1, r_2, \ldots, r_k)$. All nodes are equipped with a single
antenna and operate in half-duplex mode. We consider a
flat-fading cooperative transmission system, in which the
channel gain from the source to the destination is denoted by
$h_0$, whereas those from the source to the $i$-th relay and from
the $i$-th relay to the destination are denoted by $h_i$ and
$g_i$, respectively.

We focus on the amplify-and-forward protocol combined with
single relay selection [5]. The signal transmission is carried
out in blocks of length $2L$, $L \geq 1$. For the selected relay
node, there is a data receiving period of length $L$ followed by
a data transmitting period of length $L$.

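$$
I(\mathbf{x};\mathbf{y}) = \log M - \frac{1}{2L\,M^{2L}} \sum_{m=1}^{M^{2L}} E_{\mathbf{n}} \left\{ \log \sum_{k=1}^{M^{2L}} \exp\left[ -\left\| \mathbf{H}\mathbf{P}(\mathbf{x}_m - \mathbf{x}_k) + \mathbf{n} \right\|^2 + \|\mathbf{n}\|^2 \right] \right\} \tag{3}
$$

$$
\mathbf{E} = \mathbf{I} - \int \frac{\left[ \sum_{l=1}^{M^{2L}} \mathbf{x}_l \exp\left( -\|\mathbf{y} - \mathbf{H}\mathbf{P}\mathbf{x}_l\|^2 \right) \right] \left[ \sum_{k=1}^{M^{2L}} \mathbf{x}_k^H \exp\left( -\|\mathbf{y} - \mathbf{H}\mathbf{P}\mathbf{x}_k\|^2 \right) \right]}{M^{2L} \pi^{2L} \sum_{m=1}^{M^{2L}} \exp\left( -\|\mathbf{y} - \mathbf{H}\mathbf{P}\mathbf{x}_m\|^2 \right)} \, d\mathbf{y} \tag{9}
$$

The original data at the source node is denoted by
$\mathbf{x} = [\mathbf{x}_a^T \; \mathbf{x}_b^T]^T$, where
$\mathbf{x}_a = [x_1, \ldots, x_L]^T$ and
$\mathbf{x}_b = [x_{L+1}, \ldots, x_{2L}]^T$, with $x_l$ being the
data symbol at the $l$-th time slot, $l = 1, \ldots, 2L$.
We assume that the original information signals are equally
probable symbols from discrete signaling constellations such as
PSK, PAM, or QAM with unit covariance matrix, i.e.,
$E[\mathbf{x}\mathbf{x}^H] = \mathbf{I}$.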

The original data is processed by a precoding matrix before
being transmitted from the source node. The precoded data
$\mathbf{s} = [\mathbf{s}_a^T \; \mathbf{s}_b^T]^T$ is given by
$\mathbf{s} = \mathbf{P}\mathbf{x}$, where $\mathbf{P}$ is a
generalized complex precoding matrix.

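The source node sends the signal $\sqrt{P_s}\,\mathbf{s}_a$ with
power constraint $P_s$ during the first $L$ time slots. Let
$\mathbf{y}_i$ and $\mathbf{y}_a$ be the received signals at the
$i$-th relay node and the destination, respectively, which are
given by

$$
\mathbf{y}_i = \sqrt{P_s}\, h_i \mathbf{s}_a + \mathbf{n}_i,
$$
$$
\mathbf{y}_a = \sqrt{P_s}\, h_0 \mathbf{s}_a + \mathbf{n}_a,
$$

where $\mathbf{n}_i$ and $\mathbf{n}_a$ denote, respectively, the
independent and identically distributed (i.i.d.) zero-mean
circularly symmetric Gaussian noise with unit variance at the
$i$-th relay and the destination.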

Assuming the $i$-th relay node is selected for the second $L$
time slots, it scales the received signal by a factor $b$ (so
that its average transmit power is $P_r$) and forwards it to the
receiver. We assume that only the second-order statistics of
$h_i$ are known at the $i$-th relay node; then $b$ can be chosen
as $b = \sqrt{L P_r / \mathrm{Tr}\left( E[\mathbf{y}_i \mathbf{y}_i^H] \right)}$.
During the second $L$ time slots, the source node sends the
signal $\sqrt{P_s}\,\mathbf{s}_b$. Therefore, the destination node
receives a superposition of the relay transmission and the
source transmission according to

$$
\mathbf{y}_b = b g_i \mathbf{y}_i + \sqrt{P_s}\, h_0 \mathbf{s}_b + \mathbf{n}_b
= \sqrt{P_s}\, b h_i g_i \mathbf{s}_a + \sqrt{P_s}\, h_0 \mathbf{s}_b + \mathbf{n}_e,
$$

where $\mathbf{n}_b$ denotes the noise vector at the destination
during the second $L$ time slots, and
$\mathbf{n}_e = b g_i \mathbf{n}_i + \mathbf{n}_b$ denotes the
effective noise, distributed as $\mathcal{CN}(\mathbf{0}, N_d \mathbf{I})$
with $N_d = 1 + b^2 |g_i|^2$.

For convenience in the presentation, we normalize $\mathbf{y}_b$
by $w = 1/\sqrt{N_d}$ and denote the received signal vector as
$\mathbf{y} = [\mathbf{y}_a^T \; w\mathbf{y}_b^T]^T$. Thus, the
effective input-output relation for the two-hop transmission
with precoding is summarized as

$$
\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} = \mathbf{H}\mathbf{P}\mathbf{x} + \mathbf{n}, \tag{1}
$$

where $\mathbf{x}$ is the original transmitted signal vector;
$\mathbf{n} = [\mathbf{n}_a^T \; w\mathbf{n}_e^T]^T$ is the i.i.d.
complex Gaussian channel noise vector with zero mean and unit
variance, i.e., $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$;
and $\mathbf{H}$ is the effective channel matrix of the two-hop
relay channel

$$
\mathbf{H} = \sqrt{P_s} \begin{bmatrix} h_0 \mathbf{I} & \mathbf{0} \\ w b h_i g_i \mathbf{I} & w h_0 \mathbf{I} \end{bmatrix}. \tag{2}
$$

Our precoding scheme is thus the design of the matrix
$\mathbf{P}$ to maximize the mutual information under
finite-alphabet constraints. Note that for the proposed
algorithm to be implemented effectively in practice, the
destination estimates the effective channel matrices of the
relay network through pilot-assisted channel estimation. The
destination node then selects one relay for cooperation and
provides the corresponding effective channel to the source node
via a feedback channel. Given the special structure of the
channel matrix, the amount of feedback can be very small. After
the feedback, the source node uses the proposed precoding
algorithm to optimize the network performance.
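As an illustration of the block structure in (2), the following minimal sketch assembles the effective channel matrix with NumPy. The function name `effective_channel`, its argument names, and the numerical value of the relay gain `b` in the usage lines are assumptions made for this example only; the relay gain is passed in directly rather than derived from channel statistics.

```python
import numpy as np

def effective_channel(h0, hi, gi, Ps, b, L):
    """Assemble the 2L x 2L effective channel matrix H of (2).

    h0, hi, gi : source-destination, source-relay, relay-destination gains
    Ps         : source transmit power
    b          : relay scaling factor (assumed to be given)
    L          : half block length
    """
    Nd = 1.0 + (b ** 2) * abs(gi) ** 2   # variance of the effective noise n_e
    w = 1.0 / np.sqrt(Nd)                # normalization applied to y_b
    I = np.eye(L)
    Z = np.zeros((L, L))
    return np.sqrt(Ps) * np.block([
        [h0 * I, Z],
        [w * b * hi * gi * I, w * h0 * I],
    ])

# Hypothetical usage with L = 1 and illustrative values for Ps and b.
H = effective_channel(h0=0.4, hi=1.2, gi=-0.9j, Ps=2.0, b=0.5, L=1)
print(H)
```

The matrix is block lower triangular with $h_0$ on both diagonal blocks, so only three scalars ($h_0$, $h_i g_i$, and $b$) need to be fed back, whatever the value of $L$.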

III. Mutual Information for Relay Networks 

We consider the conventional equiprobable discrete
constellations such as $M$-ary PSK, PAM, or QAM, where $M$ is
the number of points in the signal constellation. The mutual
information between $\mathbf{x}$ and $\mathbf{y}$, with the
equivalent channel matrix $\mathbf{H}$ and the precoding matrix
$\mathbf{P}$ known at the receiver, is $I(\mathbf{x};\mathbf{y})$
given by (3), where $\|\cdot\|$ denotes the Euclidean norm of a
vector; both $\mathbf{x}_m$ and $\mathbf{x}_k$ contain $2L$
symbols, taken independently from the $M$-ary signal
constellation [7], [8].
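The expectation over the noise in (3) has no closed form, so in practice it is typically approximated numerically. The sketch below is one possible Monte Carlo estimator, not the authors' implementation. It assumes the per-symbol normalization under which the mutual information is bounded by $\log M$; the function name `mutual_information`, the sample count, and the seed are hypothetical choices.

```python
import itertools
import numpy as np

def mutual_information(G, constellation, num_noise=2000, seed=0):
    """Monte Carlo estimate of (3) in bits per symbol, with G = H @ P."""
    rng = np.random.default_rng(seed)
    rows, n = G.shape                    # n = 2L symbols per block
    M = len(constellation)
    # Enumerate all M^n input vectors, one per column.
    X = np.array(list(itertools.product(constellation, repeat=n))).T
    K = X.shape[1]
    GX = G @ X                           # noiseless channel outputs
    # i.i.d. CN(0, 1) noise samples, one row per sample.
    noise = (rng.standard_normal((num_noise, rows))
             + 1j * rng.standard_normal((num_noise, rows))) / np.sqrt(2)
    noise_energy = np.sum(np.abs(noise) ** 2, axis=1)
    total = 0.0
    for m in range(K):
        D = (GX[:, [m]] - GX).T          # rows are G (x_m - x_k)
        diff = D[None, :, :] + noise[:, None, :]
        expo = -np.sum(np.abs(diff) ** 2, axis=2) + noise_energy[:, None]
        # Numerically stable log2 of the sum over k.
        mx = expo.max(axis=1, keepdims=True)
        lse = (mx[:, 0] + np.log(np.exp(expo - mx).sum(axis=1))) / np.log(2)
        total += lse.mean()              # sample average over the noise
    return np.log2(M) - total / (n * K)

# Hypothetical usage: BPSK inputs over a 2 x 2 identity channel.
print(mutual_information(np.eye(2), [1, -1]))
```

The enumeration costs on the order of $M^{4L}$ pairwise terms per noise sample, which is why this brute-force form is only practical for small $M$ and $L$.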

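Proposition 1: Let $\mathbf{U}$ be a unitary matrix. Then the
following relationships hold:

$$
I(\mathbf{x};\mathbf{y}) = I(\mathbf{x};\mathbf{U}\mathbf{y}), \tag{4}
$$
$$
I(\mathbf{x};\mathbf{y}) \neq I(\mathbf{U}\mathbf{x};\mathbf{y}). \tag{5}
$$

Proof: See proof in [9]. □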
Proposition 1 implies that the mutual information for a
discrete input vector behaves differently from the case of
Gaussian inputs. For Gaussian inputs, the mutual information
is unchanged when either the transmitted signal $\mathbf{x}$ or
the received signal $\mathbf{y}$ is rotated by a unitary matrix.
Finite inputs do not follow the same rule: a rotation of the
transmitted signal generally changes the mutual information.
This provides a new opportunity to improve the system
performance.

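The optimization of the linear precoding matrix $\mathbf{P}$ is
carried out over all $2L \times 2L$ complex precoding matrices
under the transmit power constraint, which can be cast as the
constrained nonlinear optimization problem

$$
\begin{array}{ll}
\text{maximize} & I(\mathbf{x};\mathbf{y}) \\
\text{subject to} & \mathrm{Tr}\left\{ E[\mathbf{s}\mathbf{s}^H] \right\} = \mathrm{Tr}(\mathbf{P}\mathbf{P}^H) \leq 2L.
\end{array} \tag{6}
$$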



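Proposition 2 (Necessary Condition): The optimal precoding
matrix $\mathbf{P}^\star$ for the optimization problem (6)
satisfies the following condition [7], [8]:

$$
\nu \mathbf{P}^\star = \mathbf{H}^H \mathbf{H} \mathbf{P}^\star \mathbf{E}, \tag{7}
$$

where $\nu$ is chosen to satisfy the power constraint, and
$\mathbf{E}$ is the minimum mean square error (MMSE) matrix
$E\left[ (\mathbf{x} - E[\mathbf{x}|\mathbf{y}]) (\mathbf{x} - E[\mathbf{x}|\mathbf{y}])^H \right]$,
given explicitly by (9).

Proof: See proof in [9]. □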
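The MMSE matrix in (7) is the error covariance of the conditional-mean estimator, and it can be approximated by sampling rather than by evaluating the integral in (9). The following sketch is an illustrative Monte Carlo estimator under that interpretation; the function name `mmse_matrix`, the sample count, and the seed are assumptions of this example.

```python
import itertools
import numpy as np

def mmse_matrix(G, constellation, num_samples=20000, seed=0):
    """Monte Carlo estimate of E[(x - E[x|y])(x - E[x|y])^H], y = G x + n."""
    rng = np.random.default_rng(seed)
    rows, n = G.shape
    # All candidate input vectors, one per column.
    X = np.array(list(itertools.product(constellation, repeat=n)),
                 dtype=complex).T
    K = X.shape[1]
    GX = G @ X
    idx = rng.integers(K, size=num_samples)        # equiprobable inputs
    noise = (rng.standard_normal((rows, num_samples))
             + 1j * rng.standard_normal((rows, num_samples))) / np.sqrt(2)
    Y = GX[:, idx] + noise
    # Posterior weights p(x_k | y) are proportional to exp(-||y - G x_k||^2).
    dist = np.sum(np.abs(Y[:, :, None] - GX[:, None, :]) ** 2, axis=0)
    logw = -dist
    logw -= logw.max(axis=1, keepdims=True)        # avoid underflow
    W = np.exp(logw)
    W /= W.sum(axis=1, keepdims=True)
    X_hat = X @ W.T                                # conditional means E[x|y]
    err = X[:, idx] - X_hat
    return err @ err.conj().T / num_samples

# Hypothetical usage: BPSK inputs over a 2 x 2 identity channel.
print(np.round(mmse_matrix(np.eye(2), [1, -1]), 3))
```

Two sanity checks follow from the definition: with a zero channel the estimator reduces to the prior mean and the estimate approaches the identity, while with a very strong channel it approaches the zero matrix.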
It is important to note that Proposition 2 gives only a
necessary condition, which is satisfied by any critical point,
since the optimization problem (6) is nonconvex. It is possible
to develop an algorithm based on the gradient of the Lagrangian
as proposed in [8], [10]. However, algorithms of this kind can
get stuck at a local maximum, which is influenced by the
starting point and may be far away from the optimal solution.
This fact will be shown via an example in the sequel.

IV. Precoder Design to Maximize the Mutual 
Information 

A. Optimal Left Singular Vector 

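We start by characterizing the dependence of the mutual
information $I(\mathbf{x};\mathbf{y})$ on the precoder
$\mathbf{P}$. Consider the singular value decomposition (SVD) of
the $2L \times 2L$ channel matrix
$\mathbf{H} = \mathbf{U}_H \mathrm{Diag}(\boldsymbol{\sigma}) \mathbf{V}_H^H$,
where $\mathbf{U}_H$ and $\mathbf{V}_H$ are unitary matrices, and
the vector $\boldsymbol{\sigma}$ contains nonnegative entries in
decreasing order. Note that the equivalent channel matrix
$\mathbf{H}$ defined in (2) is full rank for any nonzero channel
gain $h_0$. We also consider the SVD of the precoding matrix
$\mathbf{P} = \mathbf{U}_P \mathrm{Diag}(\sqrt{\boldsymbol{\lambda}}) \mathbf{V}_P^H$
and define $\mathbf{U} = \mathbf{U}_P$ and
$\mathbf{V} = \mathbf{V}_P^H$, where $\mathbf{U}$ and $\mathbf{V}$
are called the left and right singular vectors, respectively;
the vector $\boldsymbol{\lambda}$ is nonnegative and constrained
by the transmit power.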

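Proposition 3: The mutual information depends on the
precoding matrix $\mathbf{P}$ only through
$\mathbf{P}^H \mathbf{H}^H \mathbf{H} \mathbf{P}$. For a given
$\mathbf{P}^H \mathbf{H}^H \mathbf{H} \mathbf{P}$, we can always
choose a precoding matrix of the form
$\mathbf{P} = \mathbf{V}_H \mathrm{Diag}(\sqrt{\boldsymbol{\lambda}}) \mathbf{V}$
in order to minimize the transmit power
$\mathrm{Tr}(\mathbf{P}\mathbf{P}^H)$, i.e., the left singular
vector of $\mathbf{P}$ coincides with the right singular vector
of $\mathbf{H}$.

Proof: See proof in [9]. □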
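The structure in Proposition 3 can be checked numerically. The sketch below draws a random channel (an assumption of this example, not the relay channel of the paper), builds a precoder whose left singular vectors are the right singular vectors of the channel, and confirms that the Gram matrix becomes $\mathbf{V}^H \mathrm{Diag}(\boldsymbol{\sigma}^2 \boldsymbol{\lambda}) \mathbf{V}$ while the transmit power equals the sum of the entries of $\boldsymbol{\lambda}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4  # plays the role of 2L; the value is illustrative

# Random complex channel and its SVD, H = U_H Diag(sigma) V_H^H.
H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U_H, sigma, V_H_herm = np.linalg.svd(H)
V_H = V_H_herm.conj().T

# Arbitrary power allocation and arbitrary unitary right singular matrix V.
lam = rng.random(n)
V, _ = np.linalg.qr(rng.standard_normal((n, n))
                    + 1j * rng.standard_normal((n, n)))

# Precoder with left singular vectors aligned to V_H.
P = V_H @ np.diag(np.sqrt(lam)) @ V

gram = P.conj().T @ H.conj().T @ H @ P
expected_gram = V.conj().T @ np.diag(sigma ** 2 * lam) @ V
power = np.trace(P @ P.conj().T).real

# Rotating the output by U_H^H leaves a diagonal channel acting on V x.
rotated = U_H.conj().T @ H @ P
expected_rotated = np.diag(sigma * np.sqrt(lam)) @ V

print(np.allclose(gram, expected_gram))
print(np.isclose(power, lam.sum()))
print(np.allclose(rotated, expected_rotated))
```

The last check shows why this choice is convenient: after rotating the received signal by $\mathbf{U}_H^H$, which by (4) does not change the mutual information, the channel seen by $\mathbf{V}\mathbf{x}$ is diagonal.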

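From the results in Propositions 1 and 3, it is possible to
simplify the channel model (1) to

$$
\mathbf{y} = \mathrm{Diag}(\boldsymbol{\sigma}) \mathrm{Diag}(\sqrt{\boldsymbol{\lambda}}) \mathbf{V} \mathbf{x} + \mathbf{n}. \tag{10}
$$

Our discussion from now on is based on this simplified channel
model (10). The optimization variables are the power allocation
vector $\boldsymbol{\lambda}$ and the right singular vector
$\mathbf{V}$, which are the focus of the next two subsections.
In the sequel, we use $I(\boldsymbol{\lambda})$ and
$I(\mathbf{V})$ to emphasize the dependence of the mutual
information on $\boldsymbol{\lambda}$ and $\mathbf{V}$,
respectively.

B. Optimal Power Allocation

Given a right singular vector of the precoder, we consider
the following optimization problem over the power allocation
vector $\boldsymbol{\lambda}$:

$$
\begin{array}{ll}
\text{maximize} & I(\boldsymbol{\lambda}) \\
\text{subject to} & \mathbf{1}^T \boldsymbol{\lambda} \leq 2L \\
 & \boldsymbol{\lambda} \succeq \mathbf{0},
\end{array} \tag{11}
$$

where $\mathbf{1}$ denotes a column vector with all entries one.

Proposition 4: The mutual information is a concave function of
the squared singular values of the precoder,
$\boldsymbol{\lambda}$; i.e., the Hessian of the mutual
information with respect to the power allocation vector
satisfies $\mathcal{H}_{\boldsymbol{\lambda}} I(\boldsymbol{\lambda}) \preceq \mathbf{0}$.
Moreover, the Jacobian of the mutual information with respect
to the power allocation vector $\boldsymbol{\lambda}$ is given by

$$
\nabla_{\boldsymbol{\lambda}} I(\boldsymbol{\lambda}) = \mathbf{R} \cdot \mathrm{vec}\left( \mathrm{Diag}^2(\boldsymbol{\sigma}) \mathbf{V} \mathbf{E} \mathbf{V}^H \right), \tag{12}
$$

where $\mathbf{R} \in \mathbb{R}^{2L \times 4L^2}$ is a reduction
matrix given by

$$
[\mathbf{R}]_{i,\, 2L(j-1)+k} = \delta_{ijk}, \quad i, j, k \in [1, 2L]. \tag{13}
$$

Proof: See proof in [9]. □

The concavity result in Proposition 4 extends the Hessian
and concavity results in [11, Theorem 5] from the real-valued
signal model to a generalized complex-valued case. It guarantees
that the globally optimal power allocation vector can be found
for a given right singular vector $\mathbf{V}$, and the gradient
result in (12) makes it possible to develop a
steepest-descent-type algorithm that achieves the global
optimum [12].

We first rewrite problem (11) using the barrier method:

$$
\text{minimize} \quad f(\boldsymbol{\lambda}) = -I(\boldsymbol{\lambda}) + \sum_{i=1}^{2L} \phi(-\lambda_i) + \phi\left( \mathbf{1}^T \boldsymbol{\lambda} - 2L \right), \tag{14}
$$

where $\phi(u)$ is the logarithmic barrier function, which
approximates an indicator of whether the constraints are
violated:

$$
\phi(u) = \begin{cases} -(1/t) \log(-u), & u < 0 \\ +\infty, & u \geq 0, \end{cases} \tag{15}
$$

with the parameter $t > 0$ setting the accuracy of the
approximation. The gradient of the objective function (14) is

$$
\nabla_{\boldsymbol{\lambda}} f(\boldsymbol{\lambda}) = -\mathbf{R} \cdot \mathrm{vec}\left( \mathrm{Diag}^2(\boldsymbol{\sigma}) \mathbf{V} \mathbf{E} \mathbf{V}^H \right) - \frac{1}{t} \left( \mathbf{q} - \frac{1}{2L - \mathbf{1}^T \boldsymbol{\lambda}} \mathbf{1} \right), \tag{16}
$$

where $q_i = 1/\lambda_i$ is the $i$-th element of vector
$\mathbf{q}$. Therefore, the steepest descent direction is
chosen as

$$
\Delta\boldsymbol{\lambda} = -\nabla_{\boldsymbol{\lambda}} f(\boldsymbol{\lambda}).
$$

It is then necessary to choose a positive step size
$\gamma \in \mathbb{R}$ so that
$f(\boldsymbol{\lambda} + \gamma \Delta\boldsymbol{\lambda}) < f(\boldsymbol{\lambda})$.
The backtracking line search conditions [12] state that $\gamma$
should be chosen to satisfy the inequalities

$$
f(\boldsymbol{\lambda}) - f(\boldsymbol{\lambda} + \gamma \Delta\boldsymbol{\lambda}) \geq \tfrac{1}{2} \gamma \|\Delta\boldsymbol{\lambda}\|^2, \tag{17}
$$
$$
f(\boldsymbol{\lambda}) - f(\boldsymbol{\lambda} + 2\gamma \Delta\boldsymbol{\lambda}) < \gamma \|\Delta\boldsymbol{\lambda}\|^2. \tag{18}
$$

The above ideas can be summarized as the following algorithm,
which is guaranteed to converge to the optimal power allocation
vector because of the concavity.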

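Algorithm 1: Steepest Descent to Maximize the Mutual
Information Over Power Allocation Vector

1) Given a feasible $\boldsymbol{\lambda}$, $t := t^{(0)} > 0$,
$\alpha > 1$, tolerance $\epsilon > 0$.

2) Compute the gradient of $f$ at $\boldsymbol{\lambda}$,
$\nabla_{\boldsymbol{\lambda}} f(\boldsymbol{\lambda})$, as in
(16), and the descent direction
$\Delta\boldsymbol{\lambda} = -\nabla_{\boldsymbol{\lambda}} f(\boldsymbol{\lambda})$.
Set the step size $\gamma := 1$.

3) Evaluate $\|\Delta\boldsymbol{\lambda}\|^2$. If it is
sufficiently small, then go to Step 7.

4) If $f(\boldsymbol{\lambda}) - f(\boldsymbol{\lambda} + 2\gamma \Delta\boldsymbol{\lambda}) \geq \gamma \|\Delta\boldsymbol{\lambda}\|^2$,
then set $\gamma := 2\gamma$, and repeat Step 4.

5) If $f(\boldsymbol{\lambda}) - f(\boldsymbol{\lambda} + \gamma \Delta\boldsymbol{\lambda}) < \tfrac{1}{2} \gamma \|\Delta\boldsymbol{\lambda}\|^2$,
then set $\gamma := \tfrac{1}{2}\gamma$, and repeat Step 5.

6) Set $\boldsymbol{\lambda} := \boldsymbol{\lambda} + \gamma \Delta\boldsymbol{\lambda}$.
Go to Step 2.

7) Stop if $1/t < \epsilon$, else $t := \alpha t$, and go to
Step 2.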
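The control flow of Algorithm 1 can be sketched in a few lines. Because evaluating the finite-alphabet mutual information and its gradient is expensive, the sketch below swaps in a simple concave stand-in objective, the Gaussian-input rate $\sum_i \log_2(1 + \sigma_i^2 \lambda_i)$, whose constrained maximizer is the classical water-filling solution and therefore gives a reference to compare against. The function name `barrier_power_allocation`, the default parameters, the inner iteration cap, and the use of the natural logarithm in the barrier are all assumptions of this example.

```python
import numpy as np

def barrier_power_allocation(sigma, budget, t0=1.0, alpha=10.0, eps=1e-3,
                             grad_tol=1e-12, max_inner=2000):
    """Barrier method with the step-size rules (17)-(18), on a stand-in objective."""
    sigma2 = np.asarray(sigma, dtype=float) ** 2
    n = sigma2.size

    def rate(lam):        # stand-in concave objective, NOT the finite-alphabet MI
        return np.sum(np.log2(1.0 + sigma2 * lam))

    def rate_grad(lam):
        return sigma2 / ((1.0 + sigma2 * lam) * np.log(2))

    def f(lam, t):        # barrier objective in the spirit of (14)
        slack = budget - lam.sum()
        if np.any(lam <= 0) or slack <= 0:
            return np.inf                 # outside the feasible set
        return -rate(lam) - (np.sum(np.log(lam)) + np.log(slack)) / t

    def f_grad(lam, t):   # same structure as (16)
        slack = budget - lam.sum()
        return -rate_grad(lam) - (1.0 / lam - 1.0 / slack) / t

    lam = np.full(n, budget / (n + 1.0))  # Step 1: strictly feasible start
    t = t0
    while True:
        for _ in range(max_inner):
            d = -f_grad(lam, t)           # Step 2: descent direction
            d2 = d @ d
            if d2 < grad_tol:             # Step 3: inner convergence
                break
            gamma, f0 = 1.0, f(lam, t)
            while f0 - f(lam + 2 * gamma * d, t) >= gamma * d2:     # Step 4
                gamma *= 2.0
            while f0 - f(lam + gamma * d, t) < 0.5 * gamma * d2:    # Step 5
                gamma *= 0.5
            lam = lam + gamma * d         # Step 6
        if 1.0 / t < eps:                 # Step 7: outer stopping rule
            return lam
        t *= alpha

# Hypothetical usage: two subchannels and a power budget of 2.
sigma = np.array([1.5, 1.0])
lam = barrier_power_allocation(sigma, budget=2.0)
print(lam)
```

Returning infinity outside the feasible set lets the two step-size loops reject infeasible trial points without any separate feasibility check. Plain steepest descent becomes slow as $t$ grows, so the result is only accurate to roughly the order of $1/t$ at the last well-converged outer stage.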
C. Optimization Over Right Singular Vector 

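This section considers an alternative optimization problem
for maximizing the mutual information over the right singular
vector $\mathbf{V}$ for a given power allocation vector:

$$
\begin{array}{ll}
\text{maximize} & I(\mathbf{V}) \\
\text{subject to} & \mathbf{V}^H \mathbf{V} = \mathbf{V} \mathbf{V}^H = \mathbf{I}.
\end{array} \tag{19}
$$

This unitary-matrix-constrained problem can be formulated as
an unconstrained one in a constrained search space:

$$
\text{minimize} \quad g(\mathbf{V}), \tag{20}
$$

where we define the function $g(\mathbf{V}) = -I(\mathbf{V})$,
with its domain restricted to the feasible set

$$
\mathrm{dom}\, g = \{ \mathbf{V} \in \mathrm{St}(n) \}, \tag{21}
$$

in which the set $\mathrm{St}(n)$ is the complex Stiefel
manifold [13]

$$
\mathrm{St}(n) = \left\{ \mathbf{V} \in \mathbb{C}^{n \times n} \mid \mathbf{V}^H \mathbf{V} = \mathbf{I} \right\}. \tag{22}
$$

Associated with each point $\mathbf{V} \in \mathrm{St}(n)$ is a
vector space called the tangent space, which is formed by all
the tangent vectors at the point $\mathbf{V}$.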

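Proposition 5: The gradient of the mutual information on
the tangent space is

$$
\nabla_{\mathbf{V}} g(\mathbf{V}) = -\mathrm{Diag}^2(\boldsymbol{\sigma}) \mathrm{Diag}(\boldsymbol{\lambda}) \mathbf{V} \mathbf{E} + \mathbf{V} \mathbf{E} \mathbf{V}^H \mathrm{Diag}^2(\boldsymbol{\sigma}) \mathrm{Diag}(\boldsymbol{\lambda}) \mathbf{V}. \tag{23}
$$

Proof: See proof in [9]. □

Using the gradient on the tangent space as the descent
direction has been suggested in [13], i.e.,
$\Delta\mathbf{V} = -\nabla_{\mathbf{V}} g(\mathbf{V})$; however,
moving along a direction on the tangent space may lose the
unitary property. Therefore, it needs to be restored in each
step via projection.

The projection $\pi(\mathbf{W})$ of an arbitrary matrix
$\mathbf{W} \in \mathbb{C}^{n \times n}$ onto the Stiefel manifold
is defined to be the point on the Stiefel manifold closest to
$\mathbf{W}$ in the Euclidean norm:

$$
\pi(\mathbf{W}) = \arg\min_{\mathbf{Q} \in \mathrm{St}(n)} \|\mathbf{W} - \mathbf{Q}\|^2. \tag{24}
$$

Proposition 6 (Projection): Let
$\mathbf{W} \in \mathbb{C}^{n \times n}$ be a full rank matrix. If
the SVD of $\mathbf{W}$ is
$\mathbf{W} = \mathbf{U}_W \boldsymbol{\Sigma} \mathbf{V}_W^H$, the
projection is unique and is given by
$\pi(\mathbf{W}) = \mathbf{U}_W \mathbf{V}_W^H$.

Proof: See proof in [9]. □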
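Proposition 6 translates directly into code: discard the singular values and keep the product of the two unitary factors. The following minimal sketch uses NumPy's SVD, which returns the Hermitian transpose of the right singular matrix as its third output; the function name `project_to_stiefel` is an assumption of this example.

```python
import numpy as np

def project_to_stiefel(W):
    """Closest unitary matrix to W in the Euclidean (Frobenius) norm."""
    U, _, Vh = np.linalg.svd(W)   # W = U @ diag(s) @ Vh
    return U @ Vh

# Hypothetical usage: project a random complex matrix and check unitarity.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q = project_to_stiefel(W)
print(np.allclose(Q.conj().T @ Q, np.eye(3)))
```

A unitary input is returned unchanged up to rounding, since all of its singular values already equal one.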

Combining the search direction and projection method with
the line search conditions in (17) and (18), we are able to
develop the optimization algorithm to maximize the mutual
information over the right singular vector $\mathbf{V}$.

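Algorithm 2: Steepest Descent to Maximize the Mutual
Information on Complex Stiefel Manifold

1) Given a feasible $\mathbf{V} \in \mathbb{C}^{n \times n}$ such
that $\mathbf{V}^H \mathbf{V} = \mathbf{I}$.

2) Compute the gradient of $g$ at $\mathbf{V}$,
$\nabla_{\mathbf{V}} g(\mathbf{V})$, as in (23), and the descent
direction $\Delta\mathbf{V} = -\nabla_{\mathbf{V}} g(\mathbf{V})$.
Set the step size $\gamma := 1$.

3) Evaluate $\|\Delta\mathbf{V}\|^2 = \mathrm{Tr}\{ (\Delta\mathbf{V})^H \Delta\mathbf{V} \}$.
If it is sufficiently small, then stop.

4) If $g(\mathbf{V}) - g(\pi(\mathbf{V} + 2\gamma \Delta\mathbf{V})) \geq \gamma \|\Delta\mathbf{V}\|^2$,
then set $\gamma := 2\gamma$, and repeat Step 4.

5) If $g(\mathbf{V}) - g(\pi(\mathbf{V} + \gamma \Delta\mathbf{V})) < \tfrac{1}{2} \gamma \|\Delta\mathbf{V}\|^2$,
then set $\gamma := \tfrac{1}{2}\gamma$, and repeat Step 5.

6) Set $\mathbf{V} := \pi(\mathbf{V} + \gamma \Delta\mathbf{V})$.
Go to Step 2.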
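The iteration of Algorithm 2 can be sketched independently of the mutual information. In the example below the objective is a stand-in, the squared Frobenius distance to a fixed unitary target, chosen only because its minimizer is known. For a Euclidean gradient $\mathbf{G}$ taken with respect to the conjugate of $\mathbf{V}$, the direction $\mathbf{G} - \mathbf{V}\mathbf{G}^H\mathbf{V}$ used here has the same structure as (23) when $\mathbf{G} = -\mathrm{Diag}^2(\boldsymbol{\sigma})\mathrm{Diag}(\boldsymbol{\lambda})\mathbf{V}\mathbf{E}$. The function names, tolerances, and iteration cap are assumptions of this sketch.

```python
import numpy as np

def project_to_stiefel(W):
    """Projection of Proposition 6: drop the singular values of W."""
    U, _, Vh = np.linalg.svd(W)
    return U @ Vh

def stiefel_descent(g, euclid_grad, V, tol=1e-12, max_iter=500):
    """Steepest descent with projection, following the steps of Algorithm 2."""
    for _ in range(max_iter):
        G = euclid_grad(V)
        dV = -(G - V @ G.conj().T @ V)        # Step 2: tangent-space direction
        d2 = np.sum(np.abs(dV) ** 2)          # Step 3: Tr{dV^H dV}
        if d2 < tol:
            break
        gamma, g0 = 1.0, g(V)
        while g0 - g(project_to_stiefel(V + 2 * gamma * dV)) >= gamma * d2:
            gamma *= 2.0                      # Step 4
        while (g0 - g(project_to_stiefel(V + gamma * dV)) < 0.5 * gamma * d2
               and gamma > 1e-16):
            gamma *= 0.5                      # Step 5
        V = project_to_stiefel(V + gamma * dV)  # Step 6, with projection
    return V

def random_unitary(n, rng):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n))
                        + 1j * rng.standard_normal((n, n)))
    return Q

# Stand-in problem: minimize ||V - A||^2 over unitary V, with A unitary.
rng = np.random.default_rng(0)
A = random_unitary(3, rng)
V0 = random_unitary(3, rng)

def g(V):
    return np.sum(np.abs(V - A) ** 2)

V_opt = stiefel_descent(g, lambda V: V - A, V0)
print(g(V0), g(V_opt))
```

Every trial point is evaluated after projection, so the iterate stays unitary throughout and the objective never has to be defined off the manifold.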
Fig. 1. Evolution of the mutual information as the linear precoder is 
iteratively optimized with the two-step algorithm and gradient algorithm. 

D. Two-Step Approach to Optimize Precoder 

Now we are ready to develop a complete two-step approach
to maximize the mutual information over a generalized
precoding matrix $\mathbf{P}$ by combining Proposition 3 and
Algorithms 1 and 2.

Algorithm 3: Two-Step Algorithm to Maximize the Mutual
Information Over a Generalized Precoding Matrix

1) Set the left singular vector of the precoder
$\mathbf{U} := \mathbf{V}_H$, and give a feasible
$\boldsymbol{\lambda}$ and $\mathbf{V}$.

2) Update power allocation vector: Run Algorithm 1 given
$\mathbf{V}$.

3) Update right singular vector: Run Algorithm 2 given
$\boldsymbol{\lambda}$.

4) Go to Step 2 until convergence.

V. Applications 

We consider a single-relay network with block length
$L = 1$ and channel coefficients $h_0 = 0.4$, $h_1 = 1.2$,
and $g_1 = -0.9j$. We assume the same transmit power at the
source and relay nodes (i.e., $P_s = P_r = P$), and the SNR
is 3 dB. When the elements of the transmitted signal
$\mathbf{x}$ are drawn independently from BPSK constellations,
the mutual information is bounded by 1 bit/s/Hz, as shown in (3).

The convergence of the proposed approach is illustrated in
Fig. 1. We also show the convergence of the algorithms proposed
in [8], [10]. Fig. 1 shows that the direct gradient method gets
stuck at a local maximum (0.53 bit/s/Hz) because the
optimization problem is not convex in general. In contrast, the
proposed two-step algorithm exploits the characterization of
the optimal solution, which leads to a solution with the optimal
left singular vector, the optimal power allocation vector (for a
given right singular vector), and a locally optimal right
singular vector from an arbitrary starting point. Hence, the
algorithm is able to converge to a much higher value
(0.85 bit/s/Hz), an improvement of about 60 percent. Note that
the progress of the proposed method has a staircase shape, with
each stair associated with either an iteration over $t$, called
an outer iteration in [12], or a switch between the optimization
of the power allocation vector and that of the right singular
vector.

Fig. 2. Mutual information of relay networks with QPSK inputs.

The performance of the proposed algorithm is shown in Fig. 2,
in which the information symbol $\mathbf{x}$ is modulated as
QPSK and the channel is the same as in the above case. For the
sake of completeness, we also show the performance
corresponding to the case of no precoding, the maximum
diversity design in [14], the maximum coding gain design in
[14], [15], and the maximum capacity design under the
Gaussian-input assumption. From Fig. 2, we make the following
observations.

The method based on maximizing capacity under the
Gaussian-input assumption may result in a significant loss for
discrete inputs, especially in the medium-to-high SNR region.
The reason lies in the difference in the design of the power
allocation vector and the right singular vector. For Gaussian
inputs, it always helps to allocate more power to the stronger
subchannels and less power to the weaker subchannels to
maximize the capacity. However, this does not work for finite
inputs. Since the mutual information of the relay network is
upper bounded by $\log M$ from (3), there is little incentive to
allocate more power to subchannels that are already close to
saturation. Moreover, for Gaussian inputs the right singular
vector that maximizes the capacity is an arbitrary unitary
matrix, because the mutual information is unchanged when the
input signal is rotated by a unitary matrix.

The maximum coding gain design performs better than the
maximum diversity order design and no precoding. We should note
that the maximum coding gain design in [14] is only valid for
block length $L = 1$ and QPSK inputs; it is extended to the case
of $L = 1$ and 16-QAM inputs in [15].

The proposed two-step precoder optimization yields a
significant gain in mutual information over a wide SNR range.
For example, when the channel coding rate is 2/3, it is about
2 dB, 4 dB, and 10 dB better than the maximum coding gain
design, no precoding, and the maximum capacity design,
respectively. Moreover, the algorithm can be used for an
arbitrary block length $L$ and input type.



VI. Summary and Conclusion 

In this paper, we have studied the precoding design for dual- 
hop AF relay networks. In contrast with the previous work 
utilizing various design criteria with the idealistic Gaussian- 
input assumptions, we have formulated the linear precoding 
design from the standpoint of discrete-constellation inputs. 
To develop an efficient precoding design algorithm, we have
chosen the mutual information as the utility function.
Unfortunately, the maximization of this utility function over
all possible complex precoding matrices is nonconvex, i.e.,
direct optimization of the precoder can get stuck at a local
maximum, which is influenced by the starting point and may
be far away from the optimal solution. We have exploited
the structure of the precoding matrix under finite-alphabet 
constraint and developed a unified framework to solve this 
nonconvex optimization problem. We have proposed a two- 
step iterative algorithm to maximize the mutual information. 
Numerical examples have shown substantial gains of our 
proposed approach on mutual information compared to its 
conventional counterparts. 